Abstract



We provide a self-contained proof of a theorem relating probabilistic coherence of
forecasts to their non-domination by rival forecasts with respect to any proper 
scoring rule. The theorem appears to be new but is closely related to results 
achieved by other investigators. 



1 Introduction 

Scoring rules measure the quality of a probability-estimate for a given event, with lower
scores signifying probabilities that are closer to the event's status (1 if it occurs, 0
otherwise). The sum of the scores for estimates p of a vector $\mathcal{E}$ of events is called the
"penalty" for p. Consider two potential defects in p.

• There may be rival estimates q for $\mathcal{E}$ whose penalty is guaranteed to be lower
than the one for p, regardless of which events come to pass.

• The events in $\mathcal{E}$ may be related by inclusion or partition, and p might violate
constraints imposed by the probability calculus (for example, that the estimate
for an event not exceed the estimate for any event that includes it).

*© 2007 by the authors. This paper may be reproduced, in its entirety, for non-commercial purposes. 
Research supported by NSF grants PHY-0652854 to Lieb, and PHY-0652356 to Seiringer. R.S. also 
acknowledges partial support from an A. P. Sloan Fellowship. 
Building on the work of earlier investigators (see below), we show that for a broad
class of scoring rules known as "proper" the two defects are equivalent. An exact
statement appears as Theorem 1. To reach it, we first explain key concepts intuitively
(the next section) then formally (Section 3). Proof of the theorem proceeds via three
propositions of independent interest (Section 4). We conclude with generalizations of
our results and an open question.



2 Intuitive account of concepts 



Imagine that you attribute probabilities .6 and .9 to events E and F, respectively, where
$E \subseteq F$. It subsequently turns out that F comes to pass but not E. How shall we assess
the perspicacity of your two estimates, which may jointly be called a probabilistic
forecast? According to one method (due to Brier, 1950) truth and falsity are coded
by 1 and 0, and your estimate of the chance of E is assigned a score of $(0 - .6)^2$ since E
did not come true (so your estimate should ideally have been zero). Your estimate for
F is likewise assigned $(1 - .9)^2$ since it should have been one. The sum of these numbers
serves as overall penalty.

Let us calculate your expected penalty for E (prior to discovering the facts). With
.6 probability you expected a score of $(1 - .6)^2$, and with the remaining probability you
expected a score of $(0 - .6)^2$, hence your overall expectation was
$.6(1 - .6)^2 + .4(0 - .6)^2 = .24$. Now suppose that you attempted to improve (lower) this expectation by
insincerely announcing .65 as the chance of E, even though your real estimate is .6.
Then your expected penalty would be $.6(1 - .65)^2 + .4(0 - .65)^2 = .2425$, worse than
before. Differential calculus reveals the general fact:



Suppose your probability for an event E is p, that your announced probability
is x, and that your penalty is assessed according to the rule: $(1 - x)^2$ if E
comes out true; $(0 - x)^2$ otherwise. Then your expected penalty is uniquely
minimized by choosing x = p.
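To make the calculus explicit, the following short derivation shows why x = p is the unique minimizer under the quadratic rule. It is a sketch of our own; the label $R$ for the expected penalty is introduced here only for illustration.

```latex
R(x) = p\,(1-x)^2 + (1-p)\,x^2, \qquad
R'(x) = -2p\,(1-x) + 2(1-p)\,x = 2\,(x-p), \qquad
R''(x) = 2 > 0 .
```

Since $R$ is strictly convex and $R'(x) = 0$ only at x = p, the minimum is unique, with $R(p) = p(1-p)$. For p = .6 this reproduces the value .24 computed above.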



Our scoring rule thus encourages sincerity since your interest lies in announcing
probabilities that conform to your beliefs. Rules like this are called proper. (We add a
continuity condition in our formal treatment, below.) For an example of an improper
rule, substitute absolute deviation for squared deviation in the original scheme.
According to the new rule, your expected penalty for E is $.6|1 - .6| + .4|0 - .6| = .48$ whereas
it drops to $.6|1 - .65| + .4|0 - .65| = .47$ if you fib as before.
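The contrast between the two rules can be checked numerically. The sketch below is illustrative only: the function names and the grid search over announcements are our own assumptions, not part of the formal development. It evaluates the expected penalty for a believed probability p and an announced probability x, and looks for the best announcement on a grid.

```python
def quadratic(i, x):
    """Quadratic (Brier) score for truth value i in {0, 1} and announcement x."""
    return (i - x) ** 2


def absolute(i, x):
    """Absolute-deviation score, used in the text as an example of an improper rule."""
    return abs(i - x)


def expected_penalty(score, p, x):
    """Expected score when the believed probability is p and the announcement is x."""
    return p * score(1, x) + (1 - p) * score(0, x)


def best_announcement(score, p, steps=1000):
    """Grid point in [0, 1] that minimizes the expected penalty (hypothetical helper)."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda x: expected_penalty(score, p, x))


if __name__ == "__main__":
    p = 0.6
    print("quadratic rule, best announcement:", best_announcement(quadratic, p))
    print("absolute rule, best announcement:", best_announcement(absolute, p))
```

Under the quadratic rule the grid search returns the sincere announcement, whereas under the absolute rule the expected penalty is linear in x, so the best announcement is pushed to an endpoint of the interval.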
Consider next the rival forecast of .95 for E and .55 for F. Because $E \subseteq F$, this
forecast is inconsistent with the probability calculus (or incoherent). Table 1 shows
that the original forecast dominates the rival inasmuch as its penalty is lower
however the facts play out. This association of incoherence and domination is not an
accident. No matter what proper scoring rule is in force, any incoherent forecast can
be replaced by a coherent one whose penalty is lower in every possible circumstance;
there is no such replacement for a coherent forecast. This fact is formulated as
Theorem 1 in the next section. It can be seen as partial vindication of probability as
an expression of chance.¹





                            Forecast
Logical possibilities       original    rival

E true,  F true             .17         .205
E false, F true             .37         1.105
E false, F false            1.17        1.205

Table 1: Penalties for two forecasts in alternative possible realities
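The entries of Table 1 can be recomputed directly from the quadratic rule. The following is a minimal sketch; the names `POSSIBILITIES`, `brier_penalty`, `penalties` and `strongly_dominates` are hypothetical helpers chosen for this illustration.

```python
# Truth values (E, F) that are logically possible when E is included in F:
# E true forces F true, so (1, 0) is excluded.
POSSIBILITIES = [(1, 1), (0, 1), (0, 0)]


def brier_penalty(outcome, forecast):
    """Sum of quadratic scores over the events, for one logical possibility."""
    return sum((o - q) ** 2 for o, q in zip(outcome, forecast))


def penalties(forecast):
    """Penalty in each logical possibility, rounded for display."""
    return [round(brier_penalty(v, forecast), 4) for v in POSSIBILITIES]


def strongly_dominates(g, f):
    """True if g has a strictly lower penalty than f in every possibility."""
    return all(brier_penalty(v, g) < brier_penalty(v, f) for v in POSSIBILITIES)


if __name__ == "__main__":
    original = (0.6, 0.9)   # forecast for (E, F)
    rival = (0.95, 0.55)    # incoherent: E is given more probability than F
    print("original:", penalties(original))
    print("rival:   ", penalties(rival))
    print("original strongly dominates rival:", strongly_dominates(original, rival))
```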



These ideas have been discussed before, first by de Finetti (1974) who began the
investigation of dominated forecasts and probabilistic consistency (called coherence).
His work relied on the quadratic scoring rule, introduced above.² Lindley (1982)
generalized de Finetti's theorem to a broad class of scoring rules. Specifically, he proved
that for every sufficiently regular generalization s of the quadratic score, there is a
transformation $T : \mathbb{R} \to \mathbb{R}$ such that a forecast f is not dominated by any other forecast
with respect to s if and only if the transformation of f by T is probabilistically coherent.
The reliance on the transformation T, however, clouds the interpretation of Lindley's
theorem.

Fresh insight into proper scoring rules comes from relating them to a generalization
of metric distance known as Bregman divergence (Bregman, 1967). This relationship
was studied by Savage (1971), albeit implicitly, and more recently by Banerjee et al.
(2005) and Gneiting and Raftery (2007). So far as we know, their results have yet to
be connected to the issue of dominance.

To pull together the threads of earlier discussions, the present work offers a
self-contained account of the relations among (i) coherent forecasts, (ii) Bregman
divergences, and (iii) domination with respect to proper scoring rules. Only elementary
analysis is presupposed. We begin by formalizing the concepts introduced above.³



¹ The other classic vindication involves sure-loss contracts; see Skyrms (2000).

² For analysis of de Finetti's work, see Joyce (1998). Note that some authors use the term
inadmissible to qualify dominated forecasts.

³ For application of scoring rules to the assessment of opinion, see Gneiting and Raftery (2007) along
with Bernardo and Smith (1994, §2.7.2) and references cited there.






3 Framework and Main Result 



Let $\Omega$ be a nonempty sample space. Subsets of $\Omega$ are called events. Let $\mathcal{E}$ be a
vector $(E_1, \dots, E_n)$ of $n \ge 1$ events over $\Omega$. We assume that $\Omega$ and $\mathcal{E}$ have been chosen
and are now fixed for the remainder of the discussion. We require $\mathcal{E}$ to have finite
dimension n but otherwise our results hold for any choice of sample space and events.
In particular, $\Omega$ can be infinite. We rely on the usual notation [0,1], (0,1), {0,1} to
denote, respectively, the closed interval $\{x : 0 \le x \le 1\}$, the open interval $\{x : 0 < x < 1\}$
and the two-point set containing 0 and 1.

Definition 1. Any element of $[0,1]^n$ is called a (probability) forecast (for $\mathcal{E}$). A
forecast f is coherent just in case there is a probability measure $\mu$ over $\Omega$ such that
for all $i \le n$, $f_i = \mu(E_i)$.

A forecast is thus a list of n numbers drawn from the unit interval. They are interpreted
as claims about the chances of the corresponding events in $\mathcal{E}$. The first event in $\mathcal{E}$ is
assigned the probability given by the first number ($f_1$) in f, and so forth. A forecast is
coherent if it is consistent with some probability measure over $\Omega$.

This brings us to scoring rules. In what follows, the numbers 0 and 1 are used to
represent falsity and truth, respectively.

Definition 2. A function $s : \{0,1\} \times [0,1] \to [0,\infty]$ is said to be a proper scoring
rule in case

(a) $p\,s(1,x) + (1-p)\,s(0,x)$ is uniquely minimized at x = p for all $p \in [0,1]$.

(b) s is continuous, meaning that for $i \in \{0,1\}$, $\lim_{n\to\infty} s(i,x_n) = s(i,x)$ for any
sequence $x_n \in [0,1]$ converging to x.

For condition 2(a), think of p as the probability you have in mind, and x as the one you
announce. Then $p\,s(1,x) + (1-p)\,s(0,x)$ is your expected score. Fixing p (your genuine
belief), the latter expression is a function of the announcement x. Proper scoring rules
encourage candor by minimizing the expected score exactly when you announce p.⁴

The continuity condition is consistent with s assuming the value $+\infty$. This can only
occur for the arguments (0,1) or (1,0), representing categorically mistaken judgment.
For if $s(0,p) = \infty$ for some $p \ne 1$, then $p\,s(1,x) + (1-p)\,s(0,x)$ cannot have a unique
minimum at x = p; similarly, $s(1,p) < +\infty$ for $p \ne 0$. A typical example of an
unbounded proper scoring rule is $s(i,x) = -\ln|1 - i - x|$ (Good, 1952). A comparison of
alternative rules is offered in Selten (1998).
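As an illustration of an unbounded rule, the sketch below evaluates the logarithmic rule and checks on a grid that the expected score is minimized by the sincere announcement. The helper names and the grid search are our own assumptions; zero-probability branches are skipped so that the product of 0 and $+\infty$ never arises.

```python
import math


def log_score(i, x):
    """Logarithmic rule s(i, x) = -ln|1 - i - x|; +inf for a categorical mistake."""
    gap = abs(1 - i - x)
    return math.inf if gap == 0 else -math.log(gap)


def expected_score(p, x):
    """Expected score for belief p and announcement x (skips zero-weight branches)."""
    total = 0.0
    if p > 0:
        total += p * log_score(1, x)
    if p < 1:
        total += (1 - p) * log_score(0, x)
    return total


def minimizer(p, steps=100):
    """Grid point in [0, 1] with the smallest expected score (hypothetical helper)."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda x: expected_score(p, x))


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.6, 1.0):
        print("belief", p, "-> best announcement", minimizer(p))
```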



⁴ Some authors call such rules strictly proper.
For an event E, we let $C_E$ be the characteristic function of E; that is, for all $\omega \in \Omega$,
$C_E(\omega) = 1$ if $\omega \in E$ and 0 otherwise. Intuitively, $C_E(\omega)$ reports whether E is true or
false if Nature chooses $\omega$.

Definition 3. Given proper scoring rule s, the penalty $P_s$ based on s for forecast f
and $\omega \in \Omega$ is given by:

$$P_s(\omega, f) = \sum_{i=1}^{n} s(C_{E_i}(\omega), f_i). \qquad (1)$$


Thus, $P_s$ sums the scores (conceived as penalties) for all the events under consideration.
Henceforth, the proper scoring rule s is regarded as given and fixed. The theorem below 
holds for any choice we make. 

Definition 4. Let a forecast f be given. 

(a) f is weakly dominated by a forecast g in case $P_s(\omega, g) \le P_s(\omega, f)$ for all $\omega \in \Omega$.

(b) f is strongly dominated by a forecast g in case $P_s(\omega, g) < P_s(\omega, f)$ for all
$\omega \in \Omega$.

Strong domination by a rival, coherent forecast g is the price to be paid for an incoherent
forecast f. Indeed, we shall prove:

Theorem 1. Let a forecast f be given.

(a) If f is coherent then it is not weakly dominated by any forecast $g \ne f$.

(b) If f is incoherent then it is strongly dominated by some coherent forecast g.

Thus, if f and g are coherent and $f \ne g$ then neither weakly dominates the other.
The theorem follows from three propositions of independent interest, stated in the next
section. We close the present section with a corollary.

Corollary 1. A forecast f is weakly dominated by a forecast $g \ne f$ if and only if f is
strongly dominated by a coherent forecast.

Proof of Corollary 1. The right-to-left direction is immediate from Definition 4. For
the left-to-right direction, suppose forecast f is weakly dominated by some $g \ne f$. Then
by Theorem 1(a), f is not coherent. So by Theorem 1(b), f is strongly dominated by
some coherent forecast. □




4 Three Propositions



The first proposition is a characterization of coherence. It is due to de Finetti (1974).



Definition 5. Let $V = \{(C_{E_1}(\omega), \dots, C_{E_n}(\omega)) : \omega \in \Omega\} \subseteq \{0,1\}^n$. Let the cardinality
of V be k. Let conv(V) be the convex hull of V, i.e., conv(V) consists of all vectors of
form $a_1 v_1 + \dots + a_k v_k$, where $v_i \in V$, $a_i \ge 0$, and $\sum_{i=1}^{k} a_i = 1$.

The $E_i$ may be related in various ways, so $k < 2^n$ is possible (indeed, this is the case of
interest).

Proposition 1. A forecast f is coherent if and only if $f \in \mathrm{conv}(V)$.
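To see Definition 5 and Proposition 1 at work, the sketch below builds V for a small, hypothetical sample space in which the first event is included in the second, and checks that the forecast induced by a probability measure coincides with the corresponding convex combination of elements of V. The sample space, the events, the measure and all function names are assumptions made for this illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical three-point sample space with E = {w1} included in F = {w1, w2}.
OMEGA = ["w1", "w2", "w3"]
EVENTS = [{"w1"}, {"w1", "w2"}]


def truth_vector(omega):
    """The vector (C_{E_1}(omega), ..., C_{E_n}(omega)) of characteristic values."""
    return tuple(int(omega in event) for event in EVENTS)


# V collects the distinct truth vectors; here k = 3 < 2^n = 4.
V = sorted({truth_vector(w) for w in OMEGA})


def forecast_from_measure(mu):
    """Coherent forecast f_i = mu(E_i) induced by point masses mu on OMEGA."""
    return tuple(sum(mu[w] for w in event) for event in EVENTS)


def convex_combination(mu):
    """Sum over omega of mu(omega) * v_omega, which is a point of conv(V)."""
    n = len(EVENTS)
    return tuple(sum(mu[w] * truth_vector(w)[i] for w in OMEGA) for i in range(n))


if __name__ == "__main__":
    mu = {"w1": Fraction(6, 10), "w2": Fraction(3, 10), "w3": Fraction(1, 10)}
    print("V =", V)
    print("forecast from measure:", forecast_from_measure(mu))
    print("convex combination:   ", convex_combination(mu))
```

With these point masses the induced forecast is (.6, .9), the coherent forecast of Section 2.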

The next proposition characterizes scoring rules in terms of convex functions. Recall
that a convex function $\varphi$ on a convex subset of $\mathbb{R}^n$ satisfies
$\varphi(ax + (1-a)y) \le a\varphi(x) + (1-a)\varphi(y)$ for all $0 < a < 1$ and all x, y in the subset. Strict convexity means
that the inequality is strict unless x = y. Variants of the following fact are proved in
Savage (1971), Banerjee et al. (2005), and Gneiting and Raftery (2007).



Proposition 2. Let s be a proper scoring rule. Then the function $\varphi : [0,1] \to \mathbb{R}$
defined by $\varphi(x) = -x\,s(1,x) - (1-x)\,s(0,x)$ is a bounded, continuous and strictly
convex function, differentiable for $x \in (0,1)$. Moreover,

$$s(i,x) = -\varphi(x) - \varphi'(x)(i - x) \quad \forall x \in (0,1). \qquad (2)$$



Conversely, if a function s satisfies (2), with $\varphi$ bounded, strictly convex and
differentiable on (0,1), and s is continuous on [0,1], then s is a proper scoring rule.

We note that the right side of (2), which is only defined for $x \in (0,1)$, can be
continuously extended to x = 0, 1. This is the content of Lemma 1 in Section 6.
If the extended s satisfies (2) then:



$$s(0,0) = -\varphi(0) \quad \text{and} \quad s(1,1) = -\varphi(1). \qquad (3)$$
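As a worked check, Eqs. (2) and (3) can be verified by hand for the quadratic rule of Section 2, $s(i,x) = (i-x)^2$. This computation is an illustration of ours and plays no role in the proofs.

```latex
\varphi(x) = -x(1-x)^2 - (1-x)x^2 = -x(1-x), \qquad \varphi'(x) = 2x - 1,
\\
-\varphi(x) - \varphi'(x)(1-x) = x(1-x) - (2x-1)(1-x) = (1-x)^2 = s(1,x),
\\
-\varphi(x) - \varphi'(x)(0-x) = x(1-x) + (2x-1)x = x^2 = s(0,x),
\\
s(0,0) = 0 = -\varphi(0), \qquad s(1,1) = 0 = -\varphi(1).
```

Here $\varphi$ is bounded and strictly convex on [0,1], as Proposition 2 requires.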



Finally, our third proposition concerns a well known property of Bregman divergences
(see, e.g., Censor and Zenios, 1997). When we apply the proposition to the proof of
Theorem 1, C will be the unit cube in $\mathbb{R}^n$.

Definition 6. Let C be a convex subset of $\mathbb{R}^n$ with non-empty interior. Let
$\Phi : C \to \mathbb{R}$ be a strictly convex function, differentiable in the interior of C, whose gradient
$\nabla\Phi$ extends to a bounded, continuous function on C. For $x, y \in C$, the Bregman
divergence $d_\Phi : C \times C \to \mathbb{R}$ corresponding to $\Phi$ is given by

$$d_\Phi(y,x) = \Phi(y) - \Phi(x) - \nabla\Phi(x) \cdot (y - x).$$






Because of the strict convexity of $\Phi$, $d_\Phi(y,x) \ge 0$ with equality if and only if y = x.
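The definition can be tried out numerically. The sketch below assumes an additive $\Phi(x) = \sum_i \varphi(x_i)$, the form used in Section 5, and takes $\varphi(x) = -x(1-x)$, which is the function associated with the quadratic rule. For this choice the Bregman divergence reduces to the squared Euclidean distance. All function names are hypothetical.

```python
def bregman(phi, dphi, y, x):
    """d_Phi(y, x) for an additive Phi(x) = sum_i phi(x_i) with derivative dphi."""
    return sum(phi(yi) - phi(xi) - dphi(xi) * (yi - xi) for yi, xi in zip(y, x))


def phi_quadratic(x):
    """phi associated with the quadratic rule: phi(x) = -x(1 - x)."""
    return -x * (1 - x)


def dphi_quadratic(x):
    """Derivative of phi_quadratic."""
    return 2 * x - 1


def squared_distance(y, x):
    """Squared Euclidean distance, for comparison."""
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))


if __name__ == "__main__":
    y, x = (1, 1), (0.6, 0.9)
    print("Bregman divergence:", bregman(phi_quadratic, dphi_quadratic, y, x))
    print("squared distance:  ", squared_distance(y, x))
```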

Proposition 3. Let $d_\Phi : C \times C \to \mathbb{R}$ be a Bregman divergence, and let $Z \subseteq C$ be a
closed convex subset of $\mathbb{R}^n$. For $x \in C \setminus Z$, there exists a unique $\pi_x \in Z$, called the
projection of x onto Z, such that

$$d_\Phi(\pi_x, x) \le d_\Phi(y, x) \quad \forall y \in Z.$$

Moreover,

$$d_\Phi(y, \pi_x) \le d_\Phi(y, x) - d_\Phi(\pi_x, x) \quad \forall y \in Z,\ x \in C \setminus Z. \qquad (4)$$

It is worth observing that Proposition 3 also holds if $x \in Z$, in which case $\pi_x = x$ and
(4) is trivially satisfied.

5 Proof of Theorem 1

The main idea of the proof is more apparent when s is bounded. So we consider this 
case on its own before allowing s to reach +oo. 

Bounded Case. 

Suppose s is bounded. In this case, the derivative of the corresponding $\varphi$ from Eq. (2)
in Proposition 2 is continuous and bounded all the way up to the boundary of [0,1].

Let $f \in [0,1]^n$ be a forecast and, for $\omega \in \Omega$, let $v_\omega \in V$ be the vector with
components $C_{E_i}(\omega)$. Let $\Phi(x) = \sum_{i=1}^{n} \varphi(x_i)$. Then

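$$
\begin{aligned}
P_s(\omega, f) &= \sum_{i=1}^{n} s(C_{E_i}(\omega), f_i) && \text{[Definition 3]} \\
&= \sum_{i=1}^{n} \left[ -\varphi(f_i) - \varphi'(f_i)\,(C_{E_i}(\omega) - f_i) \right] && \text{[Proposition 2]} \\
&= d_\Phi(v_\omega, f) - \sum_{i=1}^{n} \varphi(C_{E_i}(\omega)) && \text{[Definition 6]} \\
&= d_\Phi(v_\omega, f) + \sum_{i=1}^{n} s(C_{E_i}(\omega), C_{E_i}(\omega)) && \text{[Equation (3)]} \qquad (5)
\end{aligned}
$$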

Now assume that f is incoherent which, by Proposition 1, means that $f \notin \mathrm{conv}(V)$.
According to Eq. (4) of Proposition 3, there exists a $g \in \mathrm{conv}(V)$, namely the projection
of f onto conv(V), such that $d_\Phi(y,g) \le d_\Phi(y,f) - d_\Phi(g,f)$ for all $y \in \mathrm{conv}(V)$ and
hence, in particular, for $y \in V$. Since $d_\Phi(g,f) > 0$, Eq. (5) then gives $P_s(\omega,g) < P_s(\omega,f)$
for all $\omega \in \Omega$, which proves part (b) of Theorem 1.
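For the quadratic rule the Bregman divergence is the squared Euclidean distance, so the projection in the argument above is the ordinary Euclidean one. The sketch below carries this out for the incoherent rival forecast (.95, .55) of Section 2, where conv(V) is the triangle of pairs (a, b) with $0 \le a \le b \le 1$. The closed-form projection and the helper names are assumptions specific to this two-event example, not a general algorithm.

```python
# V for two events with E included in F.
POSSIBILITIES = [(1, 1), (0, 1), (0, 0)]


def sq_dist(y, x):
    """Squared Euclidean distance: the Bregman divergence of the quadratic rule."""
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))


def project_onto_coherent(f):
    """Euclidean projection of f in [0,1]^2 onto {(a, b): 0 <= a <= b <= 1}."""
    a, b = f
    if a <= b:
        return (a, b)      # already coherent, projection is f itself
    m = (a + b) / 2        # nearest point on the edge a = b of the triangle
    return (m, m)


if __name__ == "__main__":
    f = (0.95, 0.55)
    g = project_onto_coherent(f)
    print("projection g =", g)
    for v in POSSIBILITIES:
        # For the quadratic rule the penalty equals sq_dist(v, forecast).
        print(v, "penalty of g:", round(sq_dist(v, g), 4),
              "penalty of f:", round(sq_dist(v, f), 4))
```

In this example the coherent forecast g = (.75, .75) has a strictly lower penalty than f in each of the three logical possibilities, and inequality (4) holds with $d_\Phi(g,f) = .08$.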

To prove part (a) first note that weak dominance of f by g means that $d_\Phi(v_\omega, g) \le
d_\Phi(v_\omega, f)$ for all $v_\omega \in V$, by Eq. (5). In this case, $d_\Phi(y,g) \le d_\Phi(y,f)$ for all $y \in
\mathrm{conv}(V)$, since $d_\Phi(y,g) - d_\Phi(y,f)$ depends linearly on y. If f is coherent, $f \in \mathrm{conv}(V)$
by Proposition 1 and hence $d_\Phi(f,g) \le d_\Phi(f,f) = 0$. This implies that g = f.
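The linearity used in this step can be read off from Definition 6. The display below is a sketch of that computation, with the convex weights $a_j$ and vertices $v_j \in V$ written as in Definition 5.

```latex
d_\Phi(y,g) - d_\Phi(y,f)
  = \Phi(f) - \Phi(g) - \nabla\Phi(g)\cdot(y-g) + \nabla\Phi(f)\cdot(y-f),
\\
y = \sum_{j} a_j v_j,\ \ \sum_j a_j = 1
  \ \Longrightarrow\
d_\Phi(y,g) - d_\Phi(y,f) = \sum_j a_j \left[ d_\Phi(v_j,g) - d_\Phi(v_j,f) \right] \le 0 .
```

The terms $\Phi(y)$ cancel, so the difference is affine in y, and an inequality that holds at every element of V extends to all of conv(V).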

Unbounded Case. 

Next, consider the case when s is unbounded. In this case, the derivative of the
corresponding $\varphi$ from Proposition 2 diverges either at 0 or 1, or at both values, and
hence we cannot directly apply Proposition 3. Eq. (5) is still valid, with both sides of
the equation possibly being $+\infty$. However, if f lies either in the interior of $[0,1]^n$, or
on a point on the boundary where the derivative of $\Phi(x) = \sum_i \varphi(x_i)$ does not diverge,
an examination of the proof of Proposition 3 shows that the result still applies, as we
show now.

If $\nabla\Phi(f)$ is finite, the minimum of $\Phi(y) - \nabla\Phi(f) \cdot y$ over $y \in \mathrm{conv}(V)$ is uniquely
attained at some $g \in \mathrm{conv}(V)$. Moreover, $\nabla\Phi(g)$ is necessarily finite. Repeating the
argument in the proof of Proposition 3 shows that $d_\Phi(y,g) \le d_\Phi(y,f) - d_\Phi(g,f)$ for
any $y \in \mathrm{conv}(V)$, which is the desired inequality needed in the proof of Theorem 1(b).
We are thus left with the case in which f lies on an $(n-1)$-dimensional face of $[0,1]^n$
where the normal derivative diverges. Consider first the case n = 1. Then either
V = {0, 1}, in which case f is coherent, or V = {0} or {1}, in which case it is clear that
the unique coherent vector $g \in V$ strongly dominates f.

We now proceed by induction on the dimension n of the forecast f. In the $(n-1)$-dimensional
hypercube, either f lies inside or on a point of the boundary where the
normal derivative of $\Phi$ is finite, in which case we have just argued that there exists a
g that is coherent and satisfies $P_s(\omega,g) < P_s(\omega,f)$ for all $\omega$ such that $v_\omega$ lies in the
$(n-1)$-dimensional face. In the other case, the induction hypothesis implies that we
can find such a g. Note that for all the other $\omega$, $P_s(\omega,g) = P_s(\omega,f) = \infty$. Now simply
pick an $0 < \epsilon < 1$ and choose $g_\epsilon = (1-\epsilon)\,g + \epsilon\, l^{-1} \sum_{i=1}^{l} v_i$, where the $v_i$ denote all
the $l$ elements of V outside the $(n-1)$-dimensional hypercube. Then $P_s(\omega, g_\epsilon) < \infty$
for all $\omega$ and also, using Lemma 1, $\lim_{\epsilon\to 0} P_s(\omega, g_\epsilon) = P_s(\omega, g)$. Hence we can choose
$\epsilon$ small enough to conclude that $P_s(\omega, g_\epsilon) < P_s(\omega, f)$ for all $\omega \in \Omega$. This finishes the
proof of part (b) in the general case of unbounded s.

To prove part (a) in the general case, we note that if $f = \sum_i a_i v_i$ for $v_i \in V$ and
$a_i > 0$, then necessarily $d_\Phi(v_i, f) < \infty$. That is, any coherent f is a convex combination
of $v_i \in V$ such that $d_\Phi(v_i, f) < \infty$. This follows from the fact that a component of f
can be 0 only if this component is 0 for all the $v_i$'s. The same is true for the value 1.
But $d_\Phi(v, f)$ can be infinite only if some component of f is 0 and the corresponding
one for v is 1, or vice versa.

Since $d_\Phi(v_i, f) < \infty$ for the $v_i$ in question, also $d_\Phi(v_i, g) < \infty$ by Eq. (5) and the
assumption that f is weakly dominated by g. Moreover, $d_\Phi(v_i, g) - d_\Phi(v_i, f) \le 0$. But
$\sum_i a_i \left( d_\Phi(v_i, g) - d_\Phi(v_i, f) \right) = d_\Phi(f, g) \ge 0$, hence f = g. □

6 Proofs of Propositions 1-3

Proof of Proposition 1. Recall that n is the dimension of $\mathcal{E}$, and that k is the number
of elements in V. Let X be the collection of all nonempty sets of form $\bigcap_{i=1}^{n} E_i^*$, where
$E_i^*$ is either $E_i$ or its complement. (X corresponds to the minimal non-empty regions
appearing in the Venn diagram of $\mathcal{E}$.) It is easy to see that:

(a) X partitions $\Omega$.

It is also clear that there is a one-to-one correspondence between X and V with the
property that $e \in X$ is mapped to $v \in V$ such that for all $i \le n$, $e \subseteq E_i$ iff $v_i = 1$. (Here,
$v_i$ denotes the ith component of v.) Thus, there are k elements in X. We enumerate
them as $e_1, \dots, e_k$, and the corresponding v by $v(e_j)$. Plainly, for all $i \le n$, $E_i$ is the
disjoint union of $\{e_j : j \le k \wedge v(e_j)_i = 1\}$, and hence:

(b) For any measure $\mu$, $\mu(E_i) = \sum_{j=1}^{k} \mu(e_j)\, v(e_j)_i$ for all $1 \le i \le n$.

For the left-to-right direction of the proposition, suppose that forecast f is coherent
via probability measure $\mu$. Then $f_i = \mu(E_i)$ for all $i \le n$ and hence by (b),
$f_i = \sum_{j=1}^{k} \mu(e_j)\, v(e_j)_i$. But the $\mu(e_j)$ are non-negative and sum to one by (a), which shows
that $f \in \mathrm{conv}(V)$.

For the converse, suppose that $f \in \mathrm{conv}(V)$, which means that there are non-negative
$a_j$'s, with $\sum_j a_j = 1$, such that $f = \sum_{j=1}^{k} a_j\, v(e_j)$. Let $\mu$ be some probability measure
such that $\mu(e_j) = a_j$ for all $j \le k$. By (a) and the assumption about the $a_j$, it is clear
that such a measure $\mu$ exists. For all $i \le n$, $f_i = \sum_{j=1}^{k} a_j\, v(e_j)_i = \sum_{j=1}^{k} \mu(e_j)\, v(e_j)_i =
\mu(E_i)$ by (b), thereby exhibiting f as coherent. □

Before giving the proof of Proposition 2, we state and prove the following technical
Lemma.
Lemma 1. Let $\varphi : [0,1] \to \mathbb{R}$ be bounded, convex and differentiable on (0,1). Then
the limits $\lim_{p\to 0,1} \varphi(p)$ and $\lim_{p\to 0,1} \varphi'(p)$ exist, the latter possibly being equal to
$-\infty$ at p = 0 or $+\infty$ at p = 1. Moreover,

$$\lim_{p\to 0} p\,\varphi'(p) = \lim_{p\to 1} \varphi'(p)(1-p) = 0. \qquad (6)$$

Proof of Lemma 1. Since $\varphi$ is convex, the limits $\lim_{p\to 0,1} \varphi(p)$ exist, and they are finite
since $\varphi$ is bounded. Moreover, $\varphi'$ is a monotone increasing function, and hence also
$\lim_{p\to 0,1} \varphi'(p)$ exists (but possibly equals $-\infty$ at p = 0 or $+\infty$ at p = 1). Finally,
Eq. (6) follows again from monotonicity of $\varphi'$ and boundedness of $\varphi$, using that
$0 = \lim_{p\to 0} \int_0^p \varphi'(q)\,dq \le \lim_{p\to 0} p\,\varphi'(p)$, and likewise at p = 1. □
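As an illustration of Lemma 1 with a divergent derivative, consider the function $\varphi$ associated with the logarithmic rule $s(i,x) = -\ln|1-i-x|$ of Section 3. The computation below is our own worked example.

```latex
\varphi(p) = p\ln p + (1-p)\ln(1-p) \in [-\ln 2,\, 0], \qquad
\varphi'(p) = \ln\frac{p}{1-p},
\\
\lim_{p\to 0}\varphi'(p) = -\infty, \qquad \lim_{p\to 1}\varphi'(p) = +\infty,
\\
\lim_{p\to 0} p\,\varphi'(p) = \lim_{p\to 0}\left[\, p\ln p - p\ln(1-p) \,\right] = 0, \qquad
\lim_{p\to 1} (1-p)\,\varphi'(p) = 0 .
```

So $\varphi$ stays bounded although its derivative diverges at both endpoints, and the products in Eq. (6) still vanish.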

Proof of Proposition 2. Let s be a proper scoring rule. For $0 < p < 1$, let

$$\varphi(p) = -\min_{x} \{ p\,s(1,x) + (1-p)\,s(0,x) \}. \qquad (7)$$

By Definition 2(a), the minimum in (7) is achieved at x = p, hence
$\varphi(p) = -p\,s(1,p) - (1-p)\,s(0,p)$.

As a minimum over linear functions, $-\varphi$ is concave; hence $\varphi$ is convex. Clearly, $\varphi$
is bounded (because $s \ge 0$ implies, from (7), that $\varphi \le 0$, but a convex function can
become unbounded only by going to $+\infty$).

The fact that the minimum is achieved uniquely (Def. 2) implies that $\varphi$ is strictly
convex for the following reason. We take $x, y \in (0,1)$ with $x \ne y$ and $0 < a < 1$ and set
$z = ax + (1-a)y$. Then $\varphi(y) = -y\,s(1,y) - (1-y)\,s(0,y) > -y\,s(1,z) - (1-y)\,s(0,z)$ by
uniqueness of the minimizer at $y \ne z$. Similarly, $\varphi(x) = -x\,s(1,x) - (1-x)\,s(0,x) >
-x\,s(1,z) - (1-x)\,s(0,z)$. By adding $1-a$ times the first inequality to $a$ times the
second we obtain $a\varphi(x) + (1-a)\varphi(y) > -z\,s(1,z) - (1-z)\,s(0,z) = \varphi(z)$, which is
precisely the statement of strict convexity.

Let $\psi(p) = s(0,p) - s(1,p)$. If $\varphi$ is differentiable and $\varphi'(p) = \psi(p)$ for all $0 < p < 1$,
then (2) is satisfied, as simple algebra shows.

We shall now show that $\varphi$ is, in fact, differentiable and $\varphi' = \psi$. For any $p \in (0,1)$
and small enough $\epsilon > 0$, we have

$$\frac{1}{\epsilon}\left(\varphi(p+\epsilon) - \varphi(p)\right) = \psi(p)
- \frac{1}{\epsilon}\left[ (p+\epsilon)\left(s(1,p+\epsilon) - s(1,p)\right) + (1-p-\epsilon)\left(s(0,p+\epsilon) - s(0,p)\right) \right].$$

Since $(p+\epsilon)\,s(1,x) + (1-p-\epsilon)\,s(0,x)$ is minimized at $x = p+\epsilon$ by Definition 2,
the last term in square brackets is negative. Hence

$$\lim_{\epsilon\to 0} \frac{1}{\epsilon}\left(\varphi(p+\epsilon) - \varphi(p)\right) \ge \psi(p),$$

and similarly one shows

$$\lim_{\epsilon\to 0} \frac{1}{\epsilon}\left(\varphi(p) - \varphi(p-\epsilon)\right) \le \psi(p).$$

Since $\psi$ is continuous by Definition 2(b), this shows that $\varphi$ is differentiable, and hence
$\psi = \varphi'$. This proves Eq. (2). Continuity of $\varphi$ up to the boundary of [0,1] follows from
continuity of s and Lemma 1.

To prove the converse, first note that if $\varphi$ is bounded and convex on (0,1), it can
be extended to a continuous function on [0,1], as shown in Lemma 1. Because of strict
convexity of $\varphi$ we have, for $p \in [0,1]$ and $0 < x < 1$,

$$p\,s(1,x) + (1-p)\,s(0,x) = -\varphi(x) - \varphi'(x)(p - x) \ge -\varphi(p), \qquad (8)$$

with equality if and only if x = p.

It remains to show that the same is true for $x \in \{0,1\}$. Consider first the case x = 0.
We have to show that $p\,s(1,0) + (1-p)\,s(0,0) > -\varphi(p)$ for p > 0. By continuity of s,
Eq. (2) and Lemma 1 we have $s(1,0) = -\varphi(0) - \lim_{p\to 0} \varphi'(p)$, while $s(0,0) = -\varphi(0)$.
If $\lim_{p\to 0} \varphi'(p) = -\infty$, the result is immediate. If $\varphi'(0) := \lim_{p\to 0} \varphi'(p)$ is finite, we
have $-\varphi(0) - p\,\varphi'(0) > -\varphi(p)$ again by strict convexity of $\varphi$.

Likewise, one shows that $p\,s(1,1) + (1-p)\,s(0,1) > -\varphi(p)$ for p < 1. This finishes
the proof that s is a proper scoring rule. □

Proof of Proposition 3. For fixed $x \in C$, the function $y \mapsto d_\Phi(y,x)$ is strictly convex,
and hence achieves a unique minimum at a point $\pi_x$ in the convex, closed set Z.

Let $y \in Z$. For $0 \le \epsilon \le 1$, $(1-\epsilon)\pi_x + \epsilon y \in Z$, and hence
$d_\Phi((1-\epsilon)\pi_x + \epsilon y, x) - d_\Phi(\pi_x, x) \ge 0$ by the definition of $\pi_x$. Since $d_\Phi$ is differentiable in the first argument,
we can divide by $\epsilon$ and let $\epsilon \to 0$ to obtain

$$0 \le \lim_{\epsilon\to 0} \frac{1}{\epsilon}\left( d_\Phi((1-\epsilon)\pi_x + \epsilon y, x) - d_\Phi(\pi_x, x) \right) = \left(\nabla\Phi(\pi_x) - \nabla\Phi(x)\right) \cdot (y - \pi_x).$$

The fact that

$$d_\Phi(y,x) - d_\Phi(\pi_x,x) - d_\Phi(y,\pi_x) = \left(\nabla\Phi(\pi_x) - \nabla\Phi(x)\right) \cdot (y - \pi_x)$$

proves the claim. □
7 Generalizations



7.1 Penalty functions 

Theorem 1 holds for a larger class of penalty functions. In fact, one can use different
proper scoring rules for every event, and replace (1) by

$$P_s(\omega, f) = \sum_{i=1}^{n} s_i(C_{E_i}(\omega), f_i),$$

where the $s_i$ are possibly distinct proper scoring rules. In this way, forecasts for some
events can be penalized differently than others. The relevant Bregman divergence in this
case is given by $\Phi(x) = \sum_i \varphi_i(x_i)$, where $\varphi_i$ is determined by $s_i$ via (2). Proof of this
generalization closely follows the argument given above, so it is omitted. Additionally,
by considering more general convex functions $\Phi$ our argument generalizes to certain
non-additive penalties.

7.2 Generalized scoring rules

7.2.1 Non-uniqueness

If one relaxes the condition of unique minimization in Definition 2(a), a weaker form
of Theorem 1 still holds. Namely, for any incoherent forecast f there exists a coherent
forecast g that weakly dominates f. Strong dominance will not hold in general, as the
example of s(i, x) = 0 shows.

Proposition 2 also holds in this generalized case, but the function $\varphi$ need not be
strictly convex. Likewise, Proposition 3 can be generalized to merely convex (not
necessarily strictly convex) $\Phi$ but in this case the projection $\pi_x$ need not be unique. Eq. (4)
remains valid.

7.2.2 Discontinuity

A generalization that is more interesting mathematically is to discontinuous scoring
rules. Proposition 2 can be generalized to scoring rules that satisfy neither the
continuity condition in Definition 2 nor unique minimization. (This is also shown in ...)

Proposition 4. Let $s : \{0,1\} \times [0,1] \to [0,\infty]$ satisfy

$$p\,s(1,x) + (1-p)\,s(0,x) \ge p\,s(1,p) + (1-p)\,s(0,p) \quad \forall x, p \in [0,1]. \qquad (9)$$
Then the function $\varphi : [0,1] \to \mathbb{R}$ defined by $\varphi(x) = -x\,s(1,x) - (1-x)\,s(0,x)$ is bounded
and convex. Moreover, there exists a monotone non-decreasing function $\psi : [0,1] \to
\mathbb{R} \cup \{\pm\infty\}$, with the property that

$$\psi(x) \ge \lim_{\epsilon\to 0} \frac{1}{\epsilon}\left(\varphi(x) - \varphi(x-\epsilon)\right) \quad \forall x \in (0,1], \qquad (10)$$

$$\psi(x) \le \lim_{\epsilon\to 0} \frac{1}{\epsilon}\left(\varphi(x+\epsilon) - \varphi(x)\right) \quad \forall x \in [0,1), \qquad (11)$$

such that

$$s(i,x) = -\varphi(x) - \psi(x)(i - x) \quad \forall x \in (0,1). \qquad (12)$$

Function $\varphi$ is strictly convex if and only if the inequality (9) is strict for $x \ne p$.

Conversely, if s is of the form (12), with $\varphi$ bounded and convex and $\psi$ satisfying
(10)-(11), then s satisfies (9).



It is a fact (Hardy et al., 1934) that every convex function $\varphi$ on [0,1] is continuous on
(0,1) and has a right and left derivative, $\psi_R$ and $\psi_L$ (defined by the right sides of (11)
and (10), respectively) at every point (except the endpoints, where it has only a right
or left derivative, respectively). Both $\psi_R$ and $\psi_L$ are non-decreasing functions, and
$\psi_L(x) \le \psi_R(x)$ for all $x \in (0,1)$. Except for countably many points, $\psi_L(x) = \psi_R(x)$,
i.e., $\varphi$ is differentiable. Eqs. (10)-(11) say that $\psi_L(x) \le \psi(x) \le \psi_R(x)$.

Note that although s(0,x) and s(1,x) may be discontinuous, the combination
$\varphi(x) = -x\,s(1,x) - (1-x)\,s(0,x)$ is continuous. Hence, if s(0,x) jumps up at a point x, s(1,x)
has to jump down by an amount proportional to $(1-x)/x$.

The proof of Proposition 4 is virtually the same as the proof of Proposition 2, so we
omit it.



7.3 Open question 

Whether Theorem 1 holds for this generalized notion of a discontinuous scoring rule
remains open. The proof of Theorem 1 given here does not extend to the discontinuous
case, since for inequality (4) to hold, differentiability of $\Phi$ is necessary, in general.